Phonetic annotation of a non-native speech corpus
Abstract
Annotating non-native speech on a phonetic level is an extremely labour-intensive task and therefore requires a proper balance between the expected benefit and the resources needed. This paper reports on the experience gained when collecting and annotating a corpus of English sentences recorded by students with Italian and German as their mother tongue. The annotated data were used intensively during the development phase of a language learning tool, which allows automatic diagnosis of pronunciation errors and gives corrective feedback to the learner. Suggestions for further improvement in the annotation procedure will be presented, based on the experience acquired in creating the corpus.
Similar resources
Transcription and annotation of a Japanese accented spoken corpus of L2 Spanish for the development of CAPT applications
This paper addresses the process of transcribing and annotating spontaneous non-native speech with the aim of compiling a training corpus for the development of Computer Assisted Pronunciation Training (CAPT) applications, enhanced with Automatic Speech Recognition (ASR) technology. To better adapt ASR technology to CAPT tools, the recognition systems must be trained with non-native corpora tra...
iCALL corpus: Mandarin Chinese spoken by non-native speakers of European descent
We present iCALL, a speech corpus designed to evaluate Mandarin Chinese pronunciation patterns of non-native speakers of European descent, developed at the Institute for Infocomm Research (I2R) in Singapore. To the best of our knowledge, iCALL is larger than any reported non-native corpora to date in terms of utterance number, duration, and number of speakers: iCALL consists of 90,841 utterances...
A corpus-based analysis of transfer effects and connected speech processes in Vietnamese English
This paper presents a corpus-based descriptive analysis of the most prevalent transfer effects and connected speech processes observed in a comparison of 11 Vietnamese English speakers (6 females, 5 males) and 12 Australian English speakers (6 males, 6 females) over 24 grammatical paraphrase items. The phonetic processes are segmentally labelled in terms of IPA diacritic features using the EMU ...
CoRuSS - a New Prosodically Annotated Corpus of Russian Spontaneous Speech
This paper describes speech data recording, processing and annotation of a new speech corpus CoRuSS (Corpus of Russian Spontaneous Speech), which is based on connected communicative speech recorded from 60 native Russian male and female speakers of different age groups (from 16 to 77). Some Russian speech corpora available at the moment contain plain orthographic texts and provide some kind of ...
Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities
Automatic evaluation of non-native speech accentedness has potential implications for not only language learning and accent identification systems but also for speaker and speech recognition systems. From the perspective of speech production, the two primary factors influencing the accentedness are the phonetic and prosodic structure. In this paper, we propose an approach for automatic accented...
Publication date: 2000